We propose models and a method to qualitatively explain the receptive field properties of complex cells in the primary visual cortex. We apply a learning method based on the information maximization principle to a feedforward network comprising an input layer of image patches, simple cell-like first-output-layer neurons, and second-output-layer neurons (Model 1). Information maximization results in the emergence of complex cell-like receptive field properties in the second-output-layer neurons. After learning, each second-output-layer neuron receives connection weights of the same magnitude from pairs of first-output-layer neurons with sign-inverted receptive fields. The second-output-layer neurons replicate phase invariance and iso-orientation suppression. Furthermore, on the basis of these results, we examine a simplified model that shows the emergence of complex cell-like receptive fields (Model 2). We show that, after learning, the output neurons of this model exhibit iso-orientation suppression, cross-orientation facilitation, and end stopping, which are similar to those found in complex cells. These properties of the model neurons suggest that complex cells in the primary visual cortex become selective to features composed of edges in order to increase the variability of their output.

Keywords: information maximization principle, complex cell, primary visual cortex, extraclassical receptive field, 
computational model 



1. INTRODUCTION 

A fundamental question in neuroscience is what principle underlies neural information coding in the brain. For sensory information processing, this question can be addressed by explaining how sensory neurons acquire their selectivity to inputs. The primary visual cortex (V1) is an ideal subject for this type of investigation because a large body of experimental results on receptive field properties, single-cell electrophysiology, and topographic selectivity maps has accumulated for V1. These results allow us to screen proposed principles of neural information coding by comparing the behavior of a model built on each principle with the receptive field properties of V1 neurons. This screening offers a route toward the general principle of neural information coding in the cerebral cortex. Several principles with similar mathematical structures, such as the information maximization principle (Linsker, 1988; Bell and Sejnowski, 1997) and the sparse coding hypothesis (Barlow, 1959; Olshausen and Field, 1996), have been proposed to explain the receptive field properties of simple cells in V1. The most essential assumption of these models is the statistical independence of the output neurons. In the framework of independent component analysis (ICA), the output neurons acquire selectivity to stimuli such that their outputs are uncorrelated and as statistically independent as possible. Statistical independence is closely related to the information maximization principle and to sparse coding. If the output neurons are statistically dependent, the amount of information they convey is reduced because many of them share the same information. Thus, maximizing the amount of information conveyed by the output neurons gives a result similar to that obtained by increasing their statistical independence. If neuronal activity is independent and uncorrelated, the number of neurons firing simultaneously decreases, and the activity therefore becomes sparse.

Since the groundbreaking work of Hubel and Wiesel (1959), it has been widely accepted that simple cells have wavelet-like receptive fields, which respond when a wavelet or an edge is appropriately positioned. ICA models have revealed that ICA of natural images generates output units with simple cell-like receptive field properties. These results suggest that the statistical independence of output neurons is a promising principle of neural information coding. However, whether this principle can explain the receptive field properties of complex cells has not been addressed. In contrast to simple cells, the response of complex cells is predominantly determined by the orientation of gratings and edges, and these cells are less sensitive to the positions of edges (Hubel and Wiesel, 1962). It is therefore assumed that complex cells respond to abstract orientation and that complex cells accomplish the shift-invariant representation of stimuli in the visual field. However, recent studies reveal that complex cells are not simply orientation detectors or shift-invariance detectors: they exhibit surround suppression and facilitation by gratings outside their classical receptive fields (Jones et al., 2001, 2002). Superimposing gratings perpendicular to the preferred orientation in the receptive field of complex cells suppresses the firing of these cells (Bonds, 1989). These results suggest that the outputs of complex cells are not a simple pooling of simple cell inputs with similar orientation preferences. Learning principles that can explain the emergence of these complex properties would bring us closer to the general principle of neural information coding. Conversely, a general principle may shed light on what characterizes the features detected by complex cells. A number of theoretical studies have sought to explain the properties of complex cells (Foldiak, 1991; Hyvarinen and Hoyer, 2001; Berkes and Wiskott, 2005; Karklin and Lewicki, 2005, 2009; Shan et al., 2007), and these studies replicated extraclassical receptive fields and complicated receptive field properties. However, these models assume that the complex cell-like units receive inputs whose magnitude does not depend on the polarity (black or white) of the input image pixels. This type of input facilitates the emergence of shift-invariant complex cell-like units; however, the assumption should be justified by a general principle.

In this paper, we show that this assumption is justified by the information maximization principle. In addition, we show that the information maximization principle explains the receptive field properties of complex cells. In previous ICA models, the differential entropy of the output was used as the measure of information transmitted from the input to the output (Bell and Sejnowski, 1995; Shriki et al., 2001). In the Methods section, we introduce mutual information, entropy, and the models we propose. In the first part of the Results section, we use a three-layer feedforward network comprising an input layer of natural image patches, simple cell-like first-output-layer neurons, and second-output-layer neurons that receive inputs from the first-output-layer neurons (Model 1). Our simulation results, obtained using a learning rule that maximizes information transmission, show that second-output-layer neurons receiving inputs from first-output-layer neurons with rectifying nonlinearity become shift invariant after learning. Some second-output-layer neurons exhibit surround suppression similar to that reported for complex cells. A theoretical calculation based on the fact that the edges detected by simple cells are almost statistically independent proves that the phase insensitivity of complex cells maximizes the output entropy of V1. Model 1 contains realistic first-output-layer neurons with non-negative firing rates. However, Model 1 is computationally expensive, so we develop a simplified model of complex cells to examine the properties of the model neurons more precisely. In Model 2, the output neurons receive the absolute values of the simple cell-like neurons as inputs. This simplification reduces the computational load and accelerates convergence. In the Discussion section, we compare the proposed models with previous models and discuss their biological implications.

2. METHODS 

2.1. INFORMATION MAXIMIZATION PRINCIPLE 

Information transmitted from input x to output y is measured by 
the mutual information 

$$I(\mathbf{x};\mathbf{y}) = H(\mathbf{x}) + H(\mathbf{y}) - H(\mathbf{x},\mathbf{y}),$$



where $H(\mathbf{x})$, $H(\mathbf{y})$, and $H(\mathbf{x},\mathbf{y})$ are the entropies of the input, the output, and the joint distribution of the input and the output, respectively. Bell and Sejnowski (1995) showed that the amount of information transmitted from the input layer to the output layer equals the output entropy plus a constant that depends only on the probability distribution of the input. The mutual information can also be written as

$$I(\mathbf{x};\mathbf{y}) = H(\mathbf{y}) - H(\mathbf{y}\mid\mathbf{x}),$$

where $H(\mathbf{y}\mid\mathbf{x})$, the conditional entropy of the output given the input, is the entropy of the noise. In a noiseless system, $H(\mathbf{y}\mid\mathbf{x})$ does not depend on the connection weights, so maximizing the mutual information between input and output can be achieved by maximizing the output entropy. As in Bell and Sejnowski (1995), we ignore the intrinsic noise of neurons, although it is an important factor contributing to the variability of neuronal firing. This simplification is justified because ICA models yield qualitatively similar results when a small noise is added to the input and to the output of the first-output-layer neurons. Thus, by maximizing the output entropy, we expect to obtain an information-efficient representation of the input in the output layer.
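The relation between mutual information and output entropy can be illustrated with a scalar linear Gaussian channel $y = g x + n$. This channel is an assumption made only for illustration (the models in this paper are nonlinear and treated as noiseless): because the noise entropy $H(y\mid x)$ does not depend on the gain $g$, any increase in $I(x;y)$ comes entirely from an increase in $H(y)$. A minimal sketch using only the Python standard library:

```python
import math

def gaussian_entropy(var):
    """Differential entropy (nats) of a 1-D normal distribution with variance var."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def mutual_information(gain, var_x, var_noise):
    """I(x;y) = H(y) - H(y|x) for the channel y = gain * x + n with Gaussian x and n."""
    h_y = gaussian_entropy(gain**2 * var_x + var_noise)  # output entropy H(y)
    h_noise = gaussian_entropy(var_noise)                # H(y|x): depends only on the noise
    return h_y - h_noise

for g in (0.5, 1.0, 2.0):
    # The noise term is the same for every gain, so I(x;y) rises exactly as H(y) rises.
    print(g, round(mutual_information(g, 1.0, 0.01), 3))
```

The result matches the textbook closed form $\frac{1}{2}\log(1 + g^2\sigma_x^2/\sigma_n^2)$.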

Entropy is a measure of the variability of a random variable, 
and is defined by 

$$H(x) = -\int p(x)\log p(x)\,dx, \tag{1}$$

where $p(x)$ is the probability density function of the random variable $x$. Taking logarithms to base 2 gives the entropy in bits; in the following sections, however, we take logarithms to base $e$ because this simplifies the analytical treatment. The entropy of the univariate normal distribution

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),$$

where $\mu$ is the mean and $\sigma$ is the standard deviation, is $\log\sqrt{2\pi e\sigma^2}$; the entropy is a monotonically increasing function of the standard deviation. The uniform distribution

$$p(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

has the largest entropy, $\log(b-a)$, among the probability distributions whose domain is between $a$ and $b$ (Cover and Thomas, 2006). In other words, the uniform distribution is the probability distribution with the highest variability. This means that, under the assumption of rate coding, the entropy of the output of a neuron is large if the maximal firing rate is large and if the firing rate over time is uniformly distributed. The entropy of the $n$-variate normal distribution

$$p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\det\Sigma}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right) \tag{3}$$

is $\log\sqrt{(2\pi e)^n\det\Sigma}$, where $\boldsymbol{\mu}$ is the mean vector and $\Sigma$ is the covariance matrix. The normal distributions with the mean vectors

$$\boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \begin{pmatrix}0\\0\end{pmatrix} \tag{4}$$

and the covariance matrices

$$\Sigma_1 = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} \tag{5}$$

and

$$\Sigma_2 = \begin{pmatrix}1 & 1/\sqrt{2}\\ 1/\sqrt{2} & 1\end{pmatrix} \tag{6}$$

have the same marginal distributions; that is, $x_1$ and $x_2$ follow the standard normal distribution in both cases. Thus, the entropies of the marginal distributions of $x_1$ and $x_2$ take the same value for both distributions. However, the joint entropy of $x_1$ and $x_2$ with $\Sigma_1$, $\log_2(2\pi e) \approx 4.09$ bits, is greater than that with $\Sigma_2$, $\log_2\sqrt{(2\pi e)^2/2} \approx 3.59$ bits, because $x_1$ and $x_2$ are independent in the former and correlated in the latter. For fixed marginal distributions, entropy is maximized when the random variables are independent of each other. We can see that the joint entropy of the output of neurons takes a large value if these neurons fire vigorously and independently. Entropy can therefore be used as a measure of the output variability of a population of neurons.
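The entropy values quoted above follow directly from the closed-form expressions. A minimal sketch using only the Python standard library (the function names are illustrative) that reproduces the 4.09-bit and 3.59-bit figures for $\Sigma_1$ and $\Sigma_2$:

```python
import math

LOG2E = 1 / math.log(2)  # converts nats to bits

def normal_entropy(sigma):
    """Entropy (nats) of a univariate normal: log sqrt(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def uniform_entropy(a, b):
    """Entropy (nats) of the uniform distribution on [a, b]: log(b - a)."""
    return math.log(b - a)

def gaussian_joint_entropy_2d(cov):
    """Entropy (nats) of a 2-D normal: log sqrt((2*pi*e)^2 * det(cov))."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    return 0.5 * math.log((2 * math.pi * math.e) ** 2 * det)

r = 1 / math.sqrt(2)
h_indep = gaussian_joint_entropy_2d([[1, 0], [0, 1]]) * LOG2E  # Sigma_1, Equation 5
h_corr = gaussian_joint_entropy_2d([[1, r], [r, 1]]) * LOG2E   # Sigma_2, Equation 6
print(f"{h_indep:.2f} bits vs {h_corr:.2f} bits")  # 4.09 bits vs 3.59 bits
```

The gap is exactly 0.5 bit, because $\det\Sigma_2 = 1/2$ and $\frac{1}{2}\log_2 2 = 0.5$.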
The entropy of the output of neurons in sensory areas is large if their population firing pattern varies with the input, and it is small if the firing pattern is little affected by changes in the input. The output of sensory neurons with large output entropy can be easily utilized by higher sensory areas because different inputs are well separated in the space of output firing patterns. For this reason, entropy is used as an objective function to train models of sensory information processing. Another reason for using entropy as an objective function is that entropy is a measure of sensitivity. If the input vector $\mathbf{x}$ is transformed into an output vector $\mathbf{y}$ with the same number of elements, the entropy of the output is given by

$$H(\mathbf{y}) = \int p(\mathbf{x}) \log\lvert\det J\rvert\, d\mathbf{x} + H(\mathbf{x}), \tag{7}$$

where $J_{ij} = \partial y_i / \partial x_j$ is the Jacobian matrix of the transform. Because the $(i, j)$ entry of the Jacobian matrix is the sensitivity of output $i$ to a change in input $j$, maximizing the output entropy $H(\mathbf{y})$ can be regarded as maximizing the sensitivity of the output neurons. It is desirable for sensory neurons to be maximally sensitive to changes in the stimuli. Note that $\det J$ cannot take arbitrarily large values when bounded functions, such as that in Equation 13, are used as the activation function of the output units.
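For a linear map $\mathbf{y} = A\mathbf{x}$, the Jacobian is $J = A$ everywhere, so Equation 7 reduces to $H(\mathbf{y}) = H(\mathbf{x}) + \log\lvert\det A\rvert$. A minimal numerical check for Gaussian inputs, where both sides have closed forms; the covariance matrix and $A$ are arbitrary illustrative choices, and NumPy is assumed:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate normal with covariance matrix cov."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
cov_x = np.array([[1.0, 0.3], [0.3, 2.0]])  # illustrative input covariance
A = rng.normal(size=(2, 2))                 # illustrative linear map; its Jacobian is A everywhere

h_x = gaussian_entropy(cov_x)
h_y_direct = gaussian_entropy(A @ cov_x @ A.T)   # y = A x is Gaussian with covariance A cov_x A^T
h_y_eq7 = np.log(abs(np.linalg.det(A))) + h_x    # Equation 7 with a constant Jacobian
print(h_y_direct, h_y_eq7)
```

For a nonlinear map the Jacobian varies with $\mathbf{x}$, and the first term of Equation 7 becomes an average of $\log\lvert\det J\rvert$ over the input distribution.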

2.2. MODEL 1 

We assume that the system has an $N$-dimensional input vector $\mathbf{x}$ and an output comprising a $2N$-dimensional first-output-layer neuron vector $\mathbf{y} = [\mathbf{y}^+, \mathbf{y}^-]$ and an $N$-dimensional second-output-layer neuron vector $\mathbf{z}$ (Figure 1A). The output of the first-output-layer neurons is a deterministic function of the input $\mathbf{x}$ and is described by

$$y_i^+ = R(u_i) \tag{8}$$

and

$$y_i^- = R(-u_i), \tag{9}$$

where

$$u_i = f(a_i) \tag{10}$$

and

$$a_i = \sum_{1 \le j \le N} V_{ij} x_j. \tag{11}$$

FIGURE 1 | Schematic representation of the models. (A) The joint entropy of the first-output-layer neurons $\mathbf{y}^+$ and $\mathbf{y}^-$ and the second-output-layer neurons $\mathbf{z}$ is maximized in Model 1. $V$ is the connection weight matrix between $N = 4$ input neurons and $2N = 8$ first-output-layer neurons, and $W$ is the connection weight matrix between the first-output-layer neurons and $N = 4$ second-output-layer neurons. The first-output-layer neurons $y_i^+$ take values greater than zero if $u_i > 0$ and zero otherwise, whereas $y_i^-$ take values greater than zero if $u_i < 0$ and zero otherwise. Black and gray arrows indicate modifiable and fixed connections, respectively. Gray filled circles and open circles are units with and without a nonlinear activation function $f(x)$, respectively. (B1) The entropy of the outputs $\mathbf{u}$ and $\mathbf{z}$ is maximized in Model 2. Output $z_i$ is a function of the linear superposition of inputs $y_j^+$ and $y_j^-$ by the connection weight matrices $W^+$ and $W^-$, respectively. (B2) Model 2 is equivalent to maximizing the entropy of the output $\mathbf{z}$, which is a function of the linear superposition of inputs $|u_j|$ by the connection weight matrix $W$.

Here,

$$R(x) = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases} \tag{12}$$

is the ramp function, and

$$f(x) = 2\arctan\left(\tanh\frac{x}{2}\right) \tag{13}$$

is the activation function used in this paper. This activation function gives the same algorithm as the ICA algorithm with the "tanh nonlinearity" (Hyvarinen et al., 2001) because $f''(x)/f'(x) = -\tanh x$. In terms of maximum likelihood, it corresponds to the assumption that the independent components follow the sparse distribution $p(x) = 1/(\pi\cosh x)$. Activation functions corresponding to dense distributions, such as one satisfying $f''(x)/f'(x) = -x^3$, cannot predict the receptive field properties of simple cells. Any decomposition matrix of natural images obtained using ICA algorithms can be used as the matrix $V = (V_{ij})$. In all simulations in this paper, we use the decomposition matrix obtained by the method described in Section 2.4. The second-output-layer output $\mathbf{z}$ is a function of the first-output-layer output $\mathbf{y}$:

$$z_i = f\left(-h_i + \sum_{1 \le j \le N} W_{ij}^+ \left(y_j^+ - \bar{y}_j^+\right) + \sum_{1 \le j \le N} W_{ij}^- \left(y_j^- - \bar{y}_j^-\right)\right), \tag{14}$$

where $W_{ij}^+$ and $W_{ij}^-$ are connection weights, $h_i$ is the offset of second-output-layer neuron $i$, $\bar{y}_j^+$ is the average of $y_j^+$, and $\bar{y}_j^-$ is the average of $y_j^-$. The terms $-\bar{y}_j^+$ and $-\bar{y}_j^-$ are introduced to accelerate convergence. They do not affect the values of the parameters after convergence because an increase of $\bar{y}_j^+$ to $\bar{y}_j^+ + \Delta\bar{y}_j^+$ is offset by a change of $h_i$ to $h_i - \sum_{1 \le j \le N} W_{ij}^+ \Delta\bar{y}_j^+$.

To enable readers to replicate our results without following the details of the derivation, we summarize the simulation process here and describe how to evaluate the entropy of the output and derive the learning rule in the next section. We use a batch learning process to accelerate the simulation. The weights $W_{ij}^+$, $W_{ij}^-$, and thresholds $h_i$ are updated once every 100 steps using

$$W^\pm \leftarrow W^\pm + \epsilon \sum_{1 \le t \le 100} \Delta W^\pm(t) \tag{15}$$

and

$$\mathbf{h} \leftarrow \mathbf{h} + \epsilon \sum_{1 \le t \le 100} \Delta\mathbf{h}(t), \tag{16}$$

where $\Delta W^\pm(t)$ and $\Delta\mathbf{h}(t)$ are defined by Equations 29 and 32, respectively, and $\epsilon$ is the learning rate. The update was performed $1.63 \times 10^6$ times with $\epsilon = 10^{-4}$ and then $3 \times 10^4$ times with $\epsilon = 10^{-5}$.

2.3. ENTROPY OF OUTPUT AND DERIVATION OF THE LEARNING RULE OF MODEL 1

Here we describe the objective function (the entropy of the output to be maximized) and the derivation of the learning algorithm. First, we define the joint entropy of the first- and second-output-layer neurons, and then we calculate the derivative of the joint entropy with respect to the connection weights. In our model, the $N$-dimensional input vector gives rise to a $2N$-dimensional first-output-layer vector and an $N$-dimensional second-output-layer vector. The representation is overcomplete because there are more output components than input components. Shriki et al. (2001) considered the maximization of the entropy of an overcomplete representation of the input. Although their model contains neither a multilayer structure nor a rectifying nonlinearity, the same idea can be applied to Model 1. The probability density of $\mathbf{y}$ and $\mathbf{z}$ is related to the input distribution by

$$P(\mathbf{y}, \mathbf{z}) \propto \frac{P(\mathbf{x})}{\sqrt{\det(\chi^T \chi)}}, \tag{17}$$

where the susceptibility matrix $\chi \in \mathbb{R}^{3N \times N}$ is defined as

$$\chi = \begin{pmatrix} Y^+ \\ Y^- \\ Z \end{pmatrix}, \qquad Y_{ij}^\pm = \frac{\partial y_i^\pm}{\partial x_j}, \qquad Z_{ij} = \frac{\partial z_i}{\partial x_j}. \tag{18}$$

The entries of $\chi$ are

$$\frac{\partial y_i^\pm}{\partial x_j} = \pm s(\pm u_i)\, f'(a_i)\, V_{ij} \tag{19}$$

and

$$\frac{\partial z_i}{\partial x_j} = f'(b_i) \sum_{1 \le k \le N} \left(W_{ik}^+ s(u_k) - W_{ik}^- s(-u_k)\right) f'(a_k)\, V_{kj}, \tag{20}$$

where

$$b_i = -h_i + \sum_{1 \le j \le N} W_{ij}^+ \left(y_j^+ - \bar{y}_j^+\right) + \sum_{1 \le j \le N} W_{ij}^- \left(y_j^- - \bar{y}_j^-\right) \tag{21}$$

and

$$s(x) = \begin{cases} 1 & x > 0 \\ 1/\sqrt{2} & x = 0 \\ 0 & x < 0 \end{cases}, \tag{22}$$

because $\frac{d}{dx} R(f(x)) = s(x) f'(x)$ and $f'(-x) = f'(x)$. We define $s(0) = 1/\sqrt{2}$ to simplify the equations. Here the dependence of $Y^\pm$ and $Z$ on the time step $t$ is not explicitly shown. Thus, we



Frontiers in Computational Neuroscience 



www.frontiersin.org 



November 2013 | Volume 7 | Article 165 | 4 



Tanaka and Nakamura 



Infomax explains complex cell-like selectivity 



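maximize the entropy of the output

$$H(\mathbf{y}, \mathbf{z}) = -\iint P(\mathbf{y}, \mathbf{z}) \log P(\mathbf{y}, \mathbf{z})\, d\mathbf{y}\, d\mathbf{z} = \frac{1}{2} E\left[\log\det\left(\chi^T \chi\right)\right] + H(\mathbf{x}), \tag{23}$$

where $E[\cdot]$ denotes averaging over the input distribution, and we used $P(\mathbf{y}, \mathbf{z})\, d\mathbf{y}\, d\mathbf{z} = P(\mathbf{x})\, d\mathbf{x}$ (Shriki et al., 2001).

The first term of Equation 23 is given by the average of

$$\frac{1}{2}\log\det\left(\chi^T \chi\right) = \frac{1}{2}\log\det\left(Y^{+T} Y^+ + Y^{-T} Y^- + Z^T Z\right). \tag{24}$$

Here, note that

$$\left(Y^{+T} Y^+ + Y^{-T} Y^-\right)_{ij} = \sum_{1 \le k \le N} V_{ki}\left(s(u_k)^2 f'(a_k)^2 + s(-u_k)^2 f'(a_k)^2\right) V_{kj} = \left(V^T \operatorname{diag}\left(f'(a_k)^2\right) V\right)_{ij}, \tag{25}$$

where $\operatorname{diag}(d_k)$ is a diagonal matrix whose diagonal entries are $d_1, d_2, \ldots, d_N$. We also note that

$$\left(Z^T Z\right)_{ij} = \left(V^T \operatorname{diag}\left(f'(a_k)\right) C^T C \operatorname{diag}\left(f'(a_k)\right) V\right)_{ij}, \tag{26}$$

where

$$C_{ij} = f'(b_i)\left[W_{ij}^+ s(u_j) - W_{ij}^- s(-u_j)\right]. \tag{27}$$

From these relations we obtain

$$\begin{aligned} H(\mathbf{y}, \mathbf{z}) &= \frac{1}{2} E\left[\log\det\left(V^T \operatorname{diag}\left(f'(a_k)^2\right) V + V^T \operatorname{diag}\left(f'(a_k)\right) C^T C \operatorname{diag}\left(f'(a_k)\right) V\right)\right] + H(\mathbf{x}) \\ &= E\left[\frac{1}{2}\log\det\left(I + C^T C\right) + \sum_{1 \le k \le N} \log f'(a_k) + \log\det V\right] + H(\mathbf{x}). \end{aligned} \tag{28}$$

The second and third terms are the objective function used in the ICA model to generate the connectivity of the first-output-layer neurons (see Section 2.4). The fourth term is the entropy of the input. Differentiating the first term with respect to $W_{ij}^\pm$, we obtain

$$\begin{aligned} \Delta W_{ij}^\pm(t) &= \frac{\partial}{\partial W_{ij}^\pm}\, \frac{1}{2}\log\det\left(I + C^T C\right) = \sum_{1 \le k, l \le N} \left[C\left(I + C^T C\right)^{-1}\right]_{kl} \frac{\partial C_{kl}}{\partial W_{ij}^\pm} \\ &= \sum_{1 \le l \le N} \left[C\left(I + C^T C\right)^{-1}\right]_{il} \left(\frac{f''(b_i)}{f'(b_i)}\, C_{il}\left(y_j^\pm - \bar{y}_j^\pm\right) \pm \delta_{jl}\, f'(b_i)\, s(\pm u_j)\right) \\ &= \frac{f''(b_i)}{f'(b_i)}\left(y_j^\pm - \bar{y}_j^\pm\right)\left[C\left(I + C^T C\right)^{-1} C^T\right]_{ii} \pm \left[C\left(I + C^T C\right)^{-1}\right]_{ij} f'(b_i)\, s(\pm u_j), \end{aligned} \tag{29}$$

where

$$f'(x) = \frac{1}{\cosh x} \tag{30}$$

and

$$\frac{f''(x)}{f'(x)} = -\tanh x. \tag{31}$$

Differentiating the first term of Equation 28 with respect to $h_i$, we obtain

$$\Delta h_i(t) = \sum_{1 \le k, l \le N} \left[C\left(I + C^T C\right)^{-1}\right]_{kl} \frac{\partial C_{kl}}{\partial h_i} = -\frac{f''(b_i)}{f'(b_i)}\left[C\left(I + C^T C\right)^{-1} C^T\right]_{ii}. \tag{32}$$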



Maximization of the entropy of the second output layer, $H(\mathbf{z})$, can be achieved by replacing $(I + C^T C)$ in the above learning rules with $(C^T C)$. Results similar to those presented in this paper are obtained with this modified learning rule, which is equivalent to Model 2 without the constraint $W_{ij}^+ = W_{ij}^-$.
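To make Equations 27, 29, and 32 concrete, the following sketch evaluates the Model 1 forward pass for a single input and compares the analytic single-sample updates $\Delta W_{ij}^\pm$ and $\Delta h_i$ with central finite differences of $\frac{1}{2}\log\det(I + C^TC)$. It assumes NumPy; the network size ($N = 4$), the random weights, and the running averages $\bar{y}_j^\pm = 0.3$ are arbitrary illustrative choices, not values from the simulations reported here. Agreement up to the finite-difference error indicates that the update formulas are the gradient of the first term of Equation 28.

```python
import numpy as np

def f(x):
    """Activation function of Equation 13."""
    return 2.0 * np.arctan(np.tanh(x / 2.0))

def fprime(x):
    """Equation 30: f'(x) = 1 / cosh(x)."""
    return 1.0 / np.cosh(x)

def s(x):
    """Equation 22, with s(0) = 1/sqrt(2)."""
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, 1.0 / np.sqrt(2.0)))

def forward(x, V, Wp, Wm, h, ybar_p, ybar_m):
    """Model 1 forward pass (Equations 8-11, 14, 21) plus the matrix C of Equation 27."""
    u = f(V @ x)                                        # Equations 10-11
    yp, ym = np.maximum(u, 0.0), np.maximum(-u, 0.0)    # Equations 8-9 (ramp function R)
    b = -h + Wp @ (yp - ybar_p) + Wm @ (ym - ybar_m)    # Equation 21
    C = fprime(b)[:, None] * (Wp * s(u) - Wm * s(-u))   # Equation 27; s(u_j) broadcasts over columns j
    return u, yp, ym, b, f(b), C                        # f(b) is z, Equation 14

def first_term(C):
    """First term of Equation 28 for one input: (1/2) log det(I + C^T C)."""
    return 0.5 * np.linalg.slogdet(np.eye(C.shape[1]) + C.T @ C)[1]

def updates(u, yp, ym, b, C, ybar_p, ybar_m):
    """Single-sample Delta W^+, Delta W^-, and Delta h of Equations 29 and 32."""
    G = C @ np.linalg.inv(np.eye(C.shape[1]) + C.T @ C)  # C (I + C^T C)^{-1}
    d = np.einsum("il,il->i", G, C)                      # [C (I + C^T C)^{-1} C^T]_ii
    r = -np.tanh(b)                                      # f''(b_i) / f'(b_i), Equation 31
    dWp = np.outer(r * d, yp - ybar_p) + G * fprime(b)[:, None] * s(u)
    dWm = np.outer(r * d, ym - ybar_m) - G * fprime(b)[:, None] * s(-u)
    dh = -r * d
    return dWp, dWm, dh

def central_diff(fun, arr, eps=1e-6):
    """Numerical gradient of a scalar function with respect to every entry of arr."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        step = np.zeros_like(arr)
        step[idx] = eps
        grad[idx] = (fun(arr + step) - fun(arr - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
N = 4
V, Wp, Wm = (rng.normal(size=(N, N)) for _ in range(3))
h, x = rng.normal(size=N), rng.normal(size=N)
ybar_p = ybar_m = np.full(N, 0.3)   # hypothetical running averages of y^+ and y^-

u, yp, ym, b, z, C = forward(x, V, Wp, Wm, h, ybar_p, ybar_m)
dWp, dWm, dh = updates(u, yp, ym, b, C, ybar_p, ybar_m)

obj = lambda P, M, H: first_term(forward(x, V, P, M, H, ybar_p, ybar_m)[-1])
err = max(np.abs(central_diff(lambda P: obj(P, Wm, h), Wp) - dWp).max(),
          np.abs(central_diff(lambda M: obj(Wp, M, h), Wm) - dWm).max(),
          np.abs(central_diff(lambda H: obj(Wp, Wm, H), h) - dh).max())
print(f"max |analytic - numerical| = {err:.2e}")
```

In a full simulation, these single-sample updates would be summed over 100 inputs and applied with learning rate $\epsilon$, as in Equations 15 and 16.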

2.4. NEWTON METHOD 

We used the Newton method for the ICA model proposed by Amari et al. (1997) and Palmer et al. (2008) to obtain the connection matrix from the input units to the first-output-layer neurons and to perform the simulation of Model 2. In general, the entropy of real data in ICA models is not a concave function of the parameters and has multiple local optima. Thus, the Newton method cannot find the global optimum in most cases; however, it accelerates convergence to one of the local optima. The following summarizes the learning algorithm of Palmer et al. (2008). The entropy of the output is given by the expectation of



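$$\log\det V + \sum_{1 \le i \le N} \log f'(y_i), \tag{33}$$

where

$$y_i = \sum_{1 \le j \le N} V_{ij} x_j. \tag{34}$$

According to their results, the optimal direction $\Delta V(t)$ at step $t$ is given by

$$\Delta V(t) = B(t)\, V, \tag{35}$$

where

$$B_{ij} = \delta_{ij}\,\frac{1 + y_i f''(y_i)/f'(y_i)}{1 + \eta_i} + (1 - \delta_{ij})\,\frac{\kappa_j \sigma_i^2\, y_j f''(y_i)/f'(y_i) - y_i f''(y_j)/f'(y_j)}{\kappa_i \kappa_j \sigma_i^2 \sigma_j^2 - 1}, \tag{36}$$

$$\kappa_i = E\left[\frac{f''(y_i)^2 - f'''(y_i)\, f'(y_i)}{f'(y_i)^2}\right], \tag{37}$$

$$\eta_i = E\left[\frac{f''(y_i)^2 - f'''(y_i)\, f'(y_i)}{f'(y_i)^2}\, y_i^2\right], \tag{38}$$

and

$$\sigma_i^2 = E\left[y_i^2\right]. \tag{39}$$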



For the activation function of Equation 13,

$$\frac{f''(x)^2 - f'''(x)\, f'(x)}{f'(x)^2} = \frac{1}{\cosh^2 x}. \tag{40}$$

Here the dependence on $t$ is not explicitly shown. For the online algorithm, we updated $\kappa_i$, $\sigma_i^2$, and $\eta_i$ using

$$\kappa_i(t) = \kappa_i(t-1) + \frac{1}{\tau}\left(\frac{1}{\cosh^2 y_i(t)} - \kappa_i(t-1)\right), \tag{41}$$

$$\sigma_i^2(t) = \sigma_i^2(t-1) + \frac{1}{\tau}\left(y_i(t)^2 - \sigma_i^2(t-1)\right), \tag{42}$$

and

$$\eta_i(t) = \eta_i(t-1) + \frac{1}{\tau}\left(\frac{y_i(t)^2}{\cosh^2 y_i(t)} - \eta_i(t-1)\right) \tag{43}$$

at each time step. The integration time constant $\tau$ was set to 10,000 steps in all simulations. The weights $V_{ij}$ were updated once every 100 steps using

$$V \leftarrow V + \epsilon \sum_{1 \le t \le 100} \Delta V(t), \tag{44}$$

where $\epsilon$ is the learning rate.

The decomposition matrix $V$ was obtained using this method. We randomly selected 20 × 20 pixel image patches from the images and converted the pixels in these patches to 400-dimensional real-valued inputs, $\mathbf{x}$. The mean of the pixels of each image was subtracted from that image. The input images were not prewhitened (Olshausen and Field, 1997) because images without prewhitening yielded clearer results than prewhitened ones; the receptive field properties of the second-output-layer neurons obtained using prewhitened images were qualitatively similar to those obtained using non-prewhitened images. The update of the matrix $V$ using Equation 44 was performed $10^5$ times with $\epsilon = 10^{-5}$, $2.9 \times 10^6$ times with $\epsilon = 10^{-4}$, and $9.7 \times 10^6$ times with $\epsilon = 10^{-\ldots}$. The learning process of $V$ must be iterated sufficiently long, because insufficient optimization disrupts the receptive field properties of the output neurons.

2.5. MODEL 2

In Model 1, we decomposed $N$ pixels into $2N$ first-output-layer values and $N$ second-output-layer values. In Model 2, we decomposed $2N$ input values from the rectified linear simple cell-like elements into $N$ sign-dependent simple cell-like neurons $\mathbf{u}$ and $N$ complex cell-like neurons $\mathbf{z}$ (Figure 1B1). In other words, we fixed the outputs of the first half of the neurons to $u_i$, which are the nonlinear transformations of independent components of the images, and varied the connection weights to the second half of the neurons to obtain complex cell-like properties. The $2N$ inputs are given by $y_j^+$ and $y_j^-$, defined by Equations 8 and 9, respectively. The outputs $\mathbf{u}$ and $\mathbf{z}$ are functions of $\mathbf{y} = [\mathbf{y}^+, \mathbf{y}^-]$ and are defined as

$$u_i = y_i^+ - y_i^-, \tag{45}$$

$$z_i = f(c_i), \tag{46}$$

where

$$c_i = \sum_{1 \le j \le N} W_{ij}^+\left(y_j^+ - \bar{y}_j^+\right) + \sum_{1 \le j \le N} W_{ij}^-\left(y_j^- - \bar{y}_j^-\right). \tag{47}$$



We do not introduce the offset parameter $h_i$ in this model because $h_i$ takes values close to zero in the simulation of Model 1. Here we define

$$\mathcal{W} = \begin{pmatrix} I_N & -I_N \\ W^+ & W^- \end{pmatrix}, \tag{48}$$

where $I_N$ is the $N$-dimensional identity matrix. We fix the connection weights of the first half of the neurons, $\mathbf{u}$, to simple cell-like receptive fields and update $W^+$ and $W^-$ using an ICA algorithm. Because the covariance matrix of independent components is a diagonal matrix (Hyvarinen et al., 2001), $W^+$ and $W^-$ must be set appropriately to decorrelate the outputs. Below, we set $W^+ = W^- = W$ because, assuming $\bar{y}_j^+ = \bar{y}_j^-$, $E[(y_j^+)^2] = E[(y_j^-)^2]$, and mutual independence of different components, all of which approximately hold for independent components of natural scenes, the covariances $E[u_i c_j] - E[u_i]E[c_j]$ vanish only if $W^+ = W^-$.
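The covariance argument above can be checked numerically. In this minimal sketch, mutually independent Laplace-distributed samples are a hypothetical stand-in for the independent components $u_i$ (not natural-image data), the weights are random, and NumPy is assumed. With $W^+ = W^-$ the cross-covariances between $\mathbf{u}$ and $\mathbf{c}$ are close to zero, whereas the mismatch $W^- = W^+ + 0.5I$ leaves diagonal covariances of about $-0.5\,E[(y_i^+)^2] = -0.5$ (for unit-scale Laplace samples, $E[(y_i^+)^2] = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 400_000
u = rng.laplace(size=(T, N))                       # hypothetical independent, symmetric components
yp, ym = np.maximum(u, 0.0), np.maximum(-u, 0.0)   # rectified halves y^+ and y^-

def cross_cov(Wp, Wm):
    """Sample Cov(u_i, c_k) with c_k = sum_j W+_kj (y+_j - mean) + W-_kj (y-_j - mean), Equation 47."""
    c = (yp - yp.mean(0)) @ Wp.T + (ym - ym.mean(0)) @ Wm.T   # zero-mean by construction
    return (u - u.mean(0)).T @ c / T                          # entry (i, k) = Cov(u_i, c_k)

W = 0.5 * rng.normal(size=(N, N))
same = cross_cov(W, W)                              # W+ = W- = W
mismatch = cross_cov(W, W + 0.5 * np.eye(N))        # W- = W+ + 0.5 I
print(np.round(same, 3))
print(np.round(np.diag(mismatch), 2))
```

Analytically, $\mathrm{Cov}(u_i, c_k) = E[(y_i^+)^2]\,(W_{ki}^+ - W_{ki}^-)$ under the stated assumptions, which the second printout approximates.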
Then, the output $z_i$ is given by

$$z_i = f\left(\sum_{1 \le j \le N} W_{ij}\left(|u_j| - \overline{|u_j|}\right)\right), \tag{49}$$

where $\overline{|u_j|}$ is the average of $|u_j|$, which equals $\bar{y}_j^+ + \bar{y}_j^-$. If the input $y_j^+$ is greater than 0, $y_j^-$ is equal to 0; the probability density $P(y_j^+, y_j^-)$ is therefore not a function with finite values, and ICA algorithms cannot be applied. However, assuming that $p(\mathbf{y}^+, \mathbf{y}^-)$ is a continuous function, the entropy of the output of this system is given by the expectation of



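$$\log\det\begin{pmatrix} I_N & -I_N \\ W & W \end{pmatrix} + \sum_{1 \le i \le N} \log f'(c_i) + H(\mathbf{y}) = \log\det W + \sum_{1 \le i \le N} \log f'(c_i) + N\log 2 + H(\mathbf{y}), \tag{50}$$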

where the last two terms do not depend on $W$. Hence, maximizing the first two terms is sufficient to maximize the output entropy. This is equivalent to simulating the ICA model with the $N$-dimensional input $|u_i| - \overline{|u_i|}$ (Figure 1B2). Note that ICA can be applied to this input because the probability density $p(|u_i|)$ takes finite values, and that the first two terms of Equation 50 equal the first term of Equation 28 if $(I + C^T C)$ is replaced by $(C^T C)$ and $W_{ij}^+ = W_{ij}^- = W_{ij}$. We apply the Newton method to the $N$-dimensional inputs and outputs, which is much faster than the gradient method used for Model 1. The update of the matrix $W$ using Equation 44 was performed $1 \times 10^5$ times with $\epsilon = 10^{-6}$ and then $2.99 \times 10^7$ times with $\epsilon = 10^{-5}$.
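Because Model 2 reduces to ICA on the inputs $|u_j| - \overline{|u_j|}$, one batch of its training can be sketched with the Newton-type update of Section 2.4. This is a minimal sketch under stated assumptions: NumPy is assumed; $B_{ij}$ is written for the tanh case ($f''/f' = -\tanh$); the Laplace-distributed stand-ins for $u_i$ and the initial values of $\kappa_i$, $\sigma_i^2$, and $\eta_i$ are hypothetical; and only a single batch of 100 steps is run, so nothing here demonstrates convergence. The last lines check the constant $N\log 2$ in Equation 50.

```python
import numpy as np

def newton_B(y, kappa, sigma2, eta):
    """Single-sample B of Equation 36 for f'(x) = 1/cosh(x), where f''(x)/f'(x) = -tanh(x)."""
    N = y.size
    F = np.eye(N) - np.outer(np.tanh(y), y)           # F_ij = delta_ij + y_j f''(y_i)/f'(y_i)
    off = ~np.eye(N, dtype=bool)                      # mask of off-diagonal entries
    denom = np.outer(kappa, kappa) * np.outer(sigma2, sigma2) - 1.0
    num = kappa[None, :] * sigma2[:, None] * F - F.T  # kappa_j sigma_i^2 F_ij - F_ji
    B = np.zeros((N, N))
    B[off] = num[off] / denom[off]                    # off-diagonal 2 x 2 Newton blocks
    B[~off] = np.diag(F) / (1.0 + eta)                # diagonal entries
    return B

def update_stats(y, kappa, sigma2, eta, tau=10_000.0):
    """Running averages of Equations 41-43 with integration time constant tau."""
    sech2 = 1.0 / np.cosh(y) ** 2
    return (kappa + (sech2 - kappa) / tau,
            sigma2 + (y**2 - sigma2) / tau,
            eta + (y**2 * sech2 - eta) / tau)

def batch_update(W, inputs, stats, eps):
    """One update W <- W + eps * sum_t B(t) W over a batch (Equations 35 and 44)."""
    delta = np.zeros_like(W)
    for x in inputs:
        y = W @ x
        stats = update_stats(y, *stats)
        delta += newton_B(y, *stats) @ W
    return W + eps * delta, stats

rng = np.random.default_rng(0)
N = 3
u = rng.laplace(size=(100, N))                 # hypothetical stand-in for u_i = f(a_i)
inputs = np.abs(u) - np.abs(u).mean(0)         # Model 2 input |u_j| - mean (Figure 1B2)
stats0 = (np.full(N, 0.5), np.full(N, 3.0), np.full(N, 0.5))  # hypothetical initial statistics
W, stats = batch_update(np.eye(N), inputs, stats0, eps=1e-5)

# Constant in Equation 50: log|det [[I, -I], [W, W]]| = log|det W| + N log 2.
big = np.block([[np.eye(N), -np.eye(N)], [W, W]])
print(np.linalg.slogdet(big)[1] - np.linalg.slogdet(W)[1], N * np.log(2))
```

The off-diagonal formula solves, for each pair $(i, j)$, the $2 \times 2$ system $\kappa_i\sigma_j^2 B_{ij} + B_{ji} = F_{ij}$, which is why the denominator $\kappa_i\kappa_j\sigma_i^2\sigma_j^2 - 1$ must stay away from zero.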

2.6. CHARACTERIZATION OF MODEL NEURONS 

We fit the connection weights from the pixel at $(i, j)$ to each first-output-layer neuron with the Gabor function

$$A\exp\left(-\frac{x'^2}{2\sigma_x^2} - \frac{y'^2}{2\sigma_y^2}\right)\cos(kx' - \phi) + B, \tag{51}$$

where

$$x' = (i - x_0)\cos\theta + (j - y_0)\sin\theta, \qquad y' = -(i - x_0)\sin\theta + (j - y_0)\cos\theta,$$

with parameters $A$, $B$, $x_0$, $y_0$, $\sigma_x$, $\sigma_y$, $\theta$, $\phi$, and $k$, using the gradient descent method (Figure 2). The sum of the squared differences between the fitted function and the connection weights is less than 10% of the sum of the squared connection weights for 398 of the 400 first-output-layer neurons.
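Equation 51 and the 10% goodness-of-fit criterion can be written down directly. A minimal sketch assuming NumPy, 1-based pixel indices on a 20 × 20 patch, and illustrative parameter values; the "connection weights" are synthetic, and the gradient-descent fit over the nine parameters is omitted:

```python
import numpy as np

def gabor(i, j, A, B, x0, y0, sigma_x, sigma_y, theta, phi, k):
    """Gabor function of Equation 51 at pixel coordinates (i, j)."""
    xr = (i - x0) * np.cos(theta) + (j - y0) * np.sin(theta)    # rotated coordinate x'
    yr = -(i - x0) * np.sin(theta) + (j - y0) * np.cos(theta)   # rotated coordinate y'
    envelope = np.exp(-xr**2 / (2 * sigma_x**2) - yr**2 / (2 * sigma_y**2))
    return A * envelope * np.cos(k * xr - phi) + B

def relative_residual(weights, fitted):
    """Squared fitting error divided by the squared weights; a fit is accepted below 0.1."""
    return np.sum((fitted - weights) ** 2) / np.sum(weights ** 2)

# Illustrative 20 x 20 patch with 1-based pixel indices.
i, j = np.meshgrid(np.arange(1, 21), np.arange(1, 21), indexing="ij")
params = dict(A=1.0, B=0.0, x0=10.0, y0=10.0, sigma_x=2.0, sigma_y=3.0,
              theta=np.pi / 6, phi=0.0, k=1.0)
fitted = gabor(i, j, **params)
# Hypothetical "connection weights": the Gabor profile plus small noise.
weights = fitted + 0.02 * np.random.default_rng(0).normal(size=fitted.shape)
print(f"relative residual = {relative_residual(weights, fitted):.3f}")
```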
FIGURE 2 | Representation of the connection weights. The connection weights from input image patches to the simple cell-like first-output-layer neurons are fitted by a Gabor function. The connection weights from the first-output-layer neurons to the second-output-layer neurons are represented by bars in the second-output-layer neurons. The color and opacity of each bar indicate, respectively, the sign (red indicates positive, i.e., excitatory connections, and blue indicates negative, i.e., inhibitory connections) and the magnitude of the connection weight from the first-output-layer neuron. (A) In Model 1, the left box represents the connection weights from the first-output-layer neurons $y_1^+$ through $y_N^+$, and the right box represents the connection weights from $y_1^-$ through $y_N^-$. (B) In Model 2, connection weights $W_{ij}$ are represented by bars in a box.



The ratio of the phase-dependent (F1) to the phase-invariant (F0) response component (F1/F0 ratio) has been used to characterize simple and complex cells (Shapley and Lennie, 1985; Skottun et al., 1991). Simple cells are identified by an F1/F0 ratio greater than 1, and complex cells by an F1/F0 ratio less than 1. We calculate the F1/F0 ratio of model neuron $i$ using



(E* R feW) sin <>) + (£4, R fe(«|>)) cos <|>y 



(52) 



where $\phi$ is the phase of the grating and $z_i(\phi)$ is the output of neuron $i$ in response to the grating of phase $\phi$. Under this definition, the F1/F0 ratio equals 2 for the simple cell-like case, $z_i(\phi) = R(\sin\phi)$, and equals 0 for the perfectly phase-invariant case, $z_i(\phi) = 1$. We first choose the optimal grating for each neuron and obtain $z_i(\phi)$ by varying the phase of the optimal grating. The optimal grating is chosen from gratings with various radii (2, 3, 4, 5, and 6 pixels), center positions ($x, y = 1, 2, \ldots, 20$), orientations (0°, 20°, ..., 340°), spatial frequencies (60°/pixel, 75°/pixel, ..., 120°/pixel), and phases (0°, 20°, ..., 340°). The center of the optimal grating is used as the center of the gratings when examining the receptive field properties of model neurons (Figures 4, 6).
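The stimulus set described above can be enumerated as follows. This is a minimal sketch that assumes a unit-amplitude sinusoidal grating inside a hard circular aperture (zero outside), orientation measured from the pixel-row axis, and spatial frequency expressed as degrees of phase advance per pixel; these stimulus details and the helper name `circular_grating` are illustrative assumptions, not specifications from the text. NumPy is assumed.

```python
import numpy as np

def circular_grating(center, radius, orientation_deg, freq_deg_per_pixel, phase_deg, size=20):
    """Hypothetical sinusoidal grating inside a circular aperture on a size x size patch."""
    i, j = np.meshgrid(np.arange(1, size + 1), np.arange(1, size + 1), indexing="ij")
    theta = np.deg2rad(orientation_deg)
    # Distance (in pixels) along the direction orthogonal to the grating bars.
    pos = (i - center[0]) * np.cos(theta) + (j - center[1]) * np.sin(theta)
    grating = np.cos(np.deg2rad(freq_deg_per_pixel * pos - phase_deg))
    inside = (i - center[0]) ** 2 + (j - center[1]) ** 2 <= radius ** 2
    return np.where(inside, grating, 0.0)

# Parameter grid listed in the text (centers are x, y = 1, ..., 20).
radii = [2, 3, 4, 5, 6]
orientations = range(0, 360, 20)     # 0, 20, ..., 340 degrees
frequencies = range(60, 121, 15)     # 60, 75, ..., 120 degrees per pixel
phases = range(0, 360, 20)           # 0, 20, ..., 340 degrees
n_stimuli = len(radii) * 20 * 20 * len(orientations) * len(frequencies) * len(phases)

# Phase sweep of a single grating, as used to obtain z_i(phi).
sweep = np.stack([circular_grating((10, 10), 4, 40, 90, p) for p in phases])
print(n_stimuli, sweep.shape)        # 3240000 (18, 20, 20)
```

Searching all 3,240,000 combinations per neuron is feasible but slow in pure Python loops; vectorizing over phases, as in the sweep above, is one way to reduce the cost.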

Neurons tend to adapt to static stimuli and decrease their firing rates, so moving gratings are frequently used to evoke large responses in experiments. Because the response of this model does not depend on previous stimuli, we use single stationary stimuli as inputs. This does not necessarily mean that the results in this paper correspond to experimental results obtained using stationary stimuli; we compare the responses of model neurons with experimental results obtained using moving gratings.

3. RESULTS 

3.1. MODEL 1: INFORMATION MAXIMIZATION IN A THREE-LAYER 
FEEDFORWARD NETWORK 

Model 1 is a multilayer feedforward network (Figure 1A). This 
network contains N input units (x t ), 2N first-output-layer neu- 
rons (y + and v7), and N second-output-layer neurons (z,). We 
used randomly selected 20 pixels x 20 pixels image patches from 
natural photographs distributed by Prof. Bruno Olshausen on 
his homepage (Olshausen and Field, 1997) and converted the 
pixels in these image patches to N = 400 real-valued inputs. 
The input units correspond to the relay neurons in the lateral 
geniculate nucleus, and first- and second-output-layer neurons 
correspond to simple and complex cells in VI, respectively. The 
intensity of a pixel in the input images is represented by a real 
value in the present paper. In the preprocessing of images, we 
subtracted the mean pixel intensity from each image; the mean 
pixel intensity of an image patch is not necessarily zero. First, 
we performed the learning of the first-output-layer neurons, fol- 
lowed by the learning of the second-output-layer neurons. For 
first-output-layer neurons to acquire simple cell-like receptive 
field properties, we set the connection weights from the input 
units to first-output-layer neurons to the decomposition matrix 
V = (Vij) obtained from the linear ICA of natural image patches. 
A standard linear ICA algorithm decomposes the N-dimensional 
input vector x into an N-dimensional independent component 
vector whose elements can take both signs (Amari et al., 1996). 
Because the firing rate of a simple cell is not less than zero, we 
made two first-output-layer neurons y + and y~ from a non- 
linear transformation of independent component i; if u, > 0, 
we set y\ = and yj = 0, and if m, < 0, we set y\ = 0 and 
yj = —U{, where 

«*=/( I] v v*j) ( 53 ) 
\i<i<N / 

is the nonlinear transformation of independent component $i$ by the sigmoidal activation function $f(x)$. The first-output-layer neuron $y_i^-$ is selective to the sign inversion of the image to which the first-output-layer neuron $y_i^+$ is selective, and vice versa. The output of the second-output-layer neuron $i$ is defined by

$$z_i = f\Big(-h_i + \sum_{1 \le j \le N} W^+_{ij}\, y_j^+ + \sum_{1 \le j \le N} W^-_{ij}\, y_j^-\Big) \tag{54}$$

where $W^+_{ij}$ and $W^-_{ij}$ are the connection weights from the first-output-layer neurons $y_j^+$ and $y_j^-$, respectively, and $h_i$ is the threshold. We updated the connection weight matrices $W^+$ and $W^-$ to maximize the joint entropy of the outputs $\mathbf{y}^+$, $\mathbf{y}^-$, and $\mathbf{z}$, $H(\mathbf{y}^+, \mathbf{y}^-, \mathbf{z})$.
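To make Equations 53 and 54 concrete, here is a minimal NumPy sketch of the Model 1 forward pass (no learning). It assumes $f = \tanh$ (the text only states that $f$ is sigmoidal), uses a random matrix as a stand-in for the ICA decomposition matrix $V$, and the function name `model1_forward` is hypothetical. It also checks the sign-inversion invariance discussed in the text: with $W^+ = W^-$ and an odd $f$, inverting the input swaps $y_j^+$ and $y_j^-$ and leaves $z_i$ unchanged.

```python
import numpy as np

def f(x):
    # Assumed sigmoidal activation: tanh is odd, so f(-x) = -f(x).
    return np.tanh(x)

def model1_forward(x, V, W_plus, W_minus, h):
    """Forward pass of Model 1 (Eqs. 53 and 54) for one input vector x."""
    u = f(V @ x)                     # Eq. 53: transformed independent components
    y_plus = np.maximum(u, 0.0)      # y_i^+ = u_i if u_i > 0, else 0
    y_minus = np.maximum(-u, 0.0)    # y_i^- = -u_i if u_i < 0, else 0
    z = f(-h + W_plus @ y_plus + W_minus @ y_minus)   # Eq. 54
    return u, y_plus, y_minus, z

rng = np.random.default_rng(0)
N = 16                               # toy size; the paper uses N = 400
V = rng.standard_normal((N, N))      # stand-in for the ICA decomposition matrix
W = 0.1 * rng.standard_normal((N, N))
h = np.zeros(N)
x = rng.standard_normal(N)

# With W+ = W-, inverting the sign of x swaps y+ and y-, so z is unchanged.
u, y_plus, y_minus, z = model1_forward(x, V, W, W, h)
_, _, _, z_inv = model1_forward(-x, V, W, W, h)
print(np.allclose(z, z_inv))         # True
```

The rectified pair $y_j^+ - y_j^- = u_j$ carries exactly the information in $u_j$; Model 1 does not impose $W^+ = W^-$, so the invariance shown here is something learning must discover.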

First, we examine the connection weights of the model neurons. Note that we do not impose the constraint $W^+_{ij} = W^-_{ij}$, which would make the second-output-layer neurons invariant to the sign inversion of input images. Nevertheless, after the learning process, the connection weights $W^+_{ij}$ and $W^-_{ij}$ take similar values (Figure 3A; p < 0.01, permutation test of Spearman's rank correlation coefficient). To visualize the connection weights from the first-output-layer neurons to the second-output-layer neurons, we need a compact representation of each first-output-layer neuron. We fitted a Gabor function to the connection weights to each first-output-layer neuron, as shown in Figure 2A, and represented the fitted Gabor function by a bar. Thus, each first-output-layer neuron is indicated by a bar that represents the optimal orientation and spatial location of the fitted Gabor function. In the boxes on the right of Figure 2A, we plotted the bars corresponding to the first-output-layer neurons. The left box shows the weights $W^+_{ij}$ of a second-output-layer neuron, and the right box shows the weights $W^-_{ij}$ of the same neuron. The approximate relation $W^+_{ij} \approx W^-_{ij}$ holds in this neuron. Other examples are shown in
Figure 3B. The strongest excitatory and inhibitory connections 
of each neuron are represented by red (RGB[100%, 0%, 0%]) and blue (RGB[0%, 0%, 100%]), respectively. The other connections are represented by a paler red (RGB[100%, 100(1 - r)%, 100(1 - r)%]) and a paler blue (RGB[100(1 - r)%, 100(1 - r)%, 100%]), where r is the ratio of the strength of the connection to that of the strongest excitatory or inhibitory connection. These figures indicate that the second-output-layer neurons tend to receive inputs of the same sign from pairs of first-output-layer neurons with sign-inverted receptive fields, and from neurons with similar orientation preferences within a small region.
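The color coding of connection strengths described above can be written as a small function. This is a sketch, assuming signed scalar weights; `connection_color`, `max_exc`, and `max_inh` are hypothetical names for the mapping and for the magnitudes of a neuron's strongest excitatory and inhibitory weights.

```python
def connection_color(w, max_exc, max_inh):
    """RGB triple (in percent) for a connection weight w.

    max_exc: strongest excitatory weight of the neuron (positive).
    max_inh: magnitude of its strongest inhibitory weight (positive).
    """
    if w >= 0:
        r = w / max_exc                      # ratio to the strongest excitatory weight
        return (100.0, 100.0 * (1 - r), 100.0 * (1 - r))   # red, paler as r decreases
    r = -w / max_inh                         # ratio to the strongest inhibitory weight
    return (100.0 * (1 - r), 100.0 * (1 - r), 100.0)       # blue, paler as r decreases

print(connection_color(1.0, 2.0, 1.0))       # (100.0, 50.0, 50.0)
print(connection_color(-1.0, 2.0, 1.0))      # (0.0, 0.0, 100.0)
```

With this mapping a zero weight comes out white (RGB[100%, 100%, 100%]), so weak connections fade into the background.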

Figure 4 shows the response properties of second-output-layer neurons. The image patches presented to the network as input are shown at the top of the panels. The output of the neuron in Figure 2A in response to phase-shifted gratings is shown in Figure 4A. The output is positive for all phases; that is, this neuron is largely insensitive to the phase of the gratings. Insensitivity to grating phase is a feature of complex cells. Previous experiments have characterized simple and complex cells by measuring the relative modulation, that is, the ratio of the phase-dependent (F1) to the phase-invariant (F0) component (F1/F0 ratio), of their responses to the optimal gratings (Shapley and Lennie, 1985; Skottun et al., 1991).
FIGURE 3 | Connection weights from the first-output-layer neurons to the second-output-layer neurons in Model 1. (A) Each dot corresponds to the connection weights from two first-output-layer neurons $y_j^+$ and $y_j^-$ with sign-inverted receptive fields to the second-output-layer neuron $z_i$. (B) Connection weights from the first-output-layer neurons to 15 second-output-layer neurons.



Neurons with the F1/F0 ratio greater than 1 are identified as 
simple cells, whereas those with the F1/F0 ratio less than 1 
are identified as complex cells. The F1/F0 ratio of the neuron 
in Figure 2A is 0.02, which suggests that this cell should be 
classified as a complex cell. This phase insensitivity results from the convergence of connections of the same sign from two first-output-layer neurons with sign-inverted receptive fields. If a second-output-layer neuron receives connection weights of the same magnitude from each pair of first-output-layer neurons with sign-inverted receptive fields, the sign inversion of the input, which exchanges the activities of the two neurons in each pair, does not change the output of this second-output-layer neuron. In addition, the convergence of connections of the same sign from first-output-layer neurons with similar orientation preferences enhances the phase insensitivity of the second-output-layer neuron. The black bars in Figure 4B represent the histogram of the F1/F0 ratio of second-output-layer
neurons. This histogram shows that almost all of them are clas- 
sified as complex cells. In contrast, most of the model neurons 
with randomly shuffled connection weights exhibit F1/F0 ratios greater than 1 (Figure 4B, gray bars).

FIGURE 4 | Properties of second-output-layer neurons in Model 1. (A) Modulation of the output of the second-output-layer neuron whose connection weights are shown in Figure 2A by the phase of the gratings presented at the top. (B) Histogram of the F1/F0 ratio of the model neurons after the learning process (black) and of the model neurons whose connection weights are randomly shuffled (gray). (C) Output of the same neuron as a function of the radius of the gratings. (D) Distribution of the output of 400 second-output-layer neurons in response to the optimal gratings with the preferred radius and to the same gratings with the radius enlarged by 6 pixels. (E) Modulation of the output of the same neuron by the orientation of the gratings of the outer annulus relative to the inner patch. The dashed line indicates the output of the neuron in response to the stimulus with the grating of the inner patch only.

Thus, the F1/F0 ratio close
to zero is not a result of the multilayer structure because the 
second-output-layer neurons in the network with random first- 
to second-output-layer connections do not exhibit the F1/F0 ratio 
close to zero. Rather, it results from the information maximization in the multilayer network and the resultant approximate relation $W^+_{ij} \approx W^-_{ij}$. These results suggest that the
phase insensitivity of complex cells originates from an efficient 
encoding of the visual input. However, the F1/F0 ratios of most model neurons are much smaller than experimentally obtained values: a substantial proportion of complex cells have been reported to have F1/F0 ratios greater than 0.5 (Skottun et al., 1991). This discrepancy may suggest that simple and complex cells in V1 are not as clearly segregated as the first- and second-output-layer neurons in Model 1.
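The F1/F0 criterion can be sketched as follows, assuming responses sampled at evenly spaced grating phases over one cycle; `f1_f0_ratio` is a hypothetical helper, not the paper's analysis code. The example contrasts one half-wave rectified unit (like $y_j^+$ alone) with the equally weighted sum of a sign-inverted pair ($y_j^+ + y_j^-$), the configuration that Model 1 converges to.

```python
import numpy as np

def f1_f0_ratio(responses):
    """F1/F0 ratio of responses sampled at evenly spaced phases in [0, 2*pi)."""
    r = np.asarray(responses, dtype=float)
    phases = 2 * np.pi * np.arange(r.size) / r.size
    f0 = r.mean()                                                  # phase-invariant (DC) part
    f1 = np.abs(2.0 / r.size * np.sum(r * np.exp(-1j * phases)))  # first-harmonic amplitude
    return f1 / f0

phases = 2 * np.pi * np.arange(16) / 16
simple_like = np.maximum(np.cos(phases), 0.0)                  # one rectified unit
complex_like = simple_like + np.maximum(-np.cos(phases), 0.0)  # sign-inverted pair, equal weights
print(round(f1_f0_ratio(simple_like), 2))    # 1.59
print(f1_f0_ratio(complex_like) < 1e-9)      # True
```

A half-wave rectified sinusoid has F1/F0 = π/2 in the continuous limit (simple cell-like, above 1), whereas the pair sums to |cos φ|, whose first harmonic vanishes, giving a ratio near zero, as for the learned second-output-layer neurons.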

Stimuli presented in the silent surround of the receptive field can modulate the responses of V1 cells to stimuli presented in the classical receptive field (Jones et al., 2002). In most cells, the suppression is greatest when the orientation of the gratings in the silent surround matches the optimal orientation of the classical receptive field. The model neuron shown in Figure 2A is suppressed when the radius of the grating is increased (Figure 4C). Inhibitory connections play an important role in this suppression: strong inhibitory connections originate from first-output-layer neurons whose orientation preferences are similar to those of the first-output-layer neurons sending strong excitatory connections. In this neuron, the excitatory and inhibitory areas are separated from each other (Figure 2A), and the optimal grating (the leftmost grating in Figure 4C) does not cover the inhibitory area. To examine the suppression of the second-output-layer neurons systematically, we first choose, for each neuron, the optimal grating disk with the preferred radius and then enlarge its radius by 6 pixels. Figure 4D shows that enlarging the radius of the optimal gratings decreases the output of most second-output-layer neurons. The activity of a second-output-layer neuron is large if the presented grating is restricted to the receptive fields of first-output-layer neurons with excitatory connections, whereas the activity diminishes if the presented grating covers the receptive fields of first-output-layer neurons with inhibitory connections. Jones et al. (2002) reported that iso-orientation suppression was found in 94% of V1 cells. Similarly, almost all of the second-output-layer neurons (396/400) in Model 1 exhibit surround suppression.
Figure 4E shows the response of the same neuron as in Figure 4C 
to the surrounding annulus for various orientations in the pres- 
ence of the inner patch of grating at its preferred orientation. 
When the orientation of the grating of the outer annulus devi- 
ates from the preferred orientation, the degree of suppression 
diminishes. This type of response, which is classified as iso- 
orientation suppression (Jones et al., 2002), is a result of the 
convergence of the excitatory and inhibitory connections from 
first-output-layer neurons with similar orientation preferences. 
The first-output-layer neurons with inhibitory connections cor- 
respond to the silent surroundings outside the classical receptive 
field of complex cells. Because the surrounding gratings perpen- 
dicular to the center grating suppress the response, this model 
neuron is classified into the group called "mixed general sup- 
pression and orientation alignment suppression" in Jones et al. 
(2002). The responses to the stimulus with the surrounding grating perpendicular to the center grating are greater than those with the surrounding grating parallel to the center in 376 out of 400 second-output-layer neurons (p < 0.01, binomial test). However, the responses to the stimulus with the surrounding grating perpendicular to the center grating are greater than the response to the center stimulus alone in only 175 second-output-layer neurons. Jones et al. (2002) reported that 63% of neurons in V1 exhibited cross-orientation facilitation; Model 1 therefore fails to reproduce this result.
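The stimulus protocol used above (an optimal grating disk, the same disk enlarged by 6 pixels, and a center patch with an iso- or cross-oriented surround annulus) can be sketched as follows. The spatial frequency, preferred radius, and orientation convention are hypothetical placeholders, since the section does not give the actual stimulus parameters; `grating` and `center_surround` are illustrative helpers.

```python
import numpy as np

def grating(size, sf, theta, phase):
    """Sinusoidal grating on a size x size grid (sf in cycles/pixel, angles in radians)."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size] - c
    return np.cos(2 * np.pi * sf * (xx * np.cos(theta) + yy * np.sin(theta)) + phase)

def center_surround(size, sf, theta_c, theta_s, r_in, r_out, phase=0.0):
    """Disk (r < r_in) at orientation theta_c plus annulus (r_in <= r < r_out) at theta_s."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size] - c
    r = np.hypot(xx, yy)
    img = np.zeros((size, size))                  # blank (mean-luminance) background
    disk = r < r_in
    ring = (r >= r_in) & (r < r_out)
    img[disk] = grating(size, sf, theta_c, phase)[disk]
    img[ring] = grating(size, sf, theta_s, phase)[ring]
    return img

size, sf, theta = 20, 0.125, 0.0   # 20 x 20 patches as in the paper; sf and theta hypothetical
r_opt = 5.0                        # hypothetical preferred radius
optimal = center_surround(size, sf, theta, theta, r_opt, r_opt)                # disk only
enlarged = center_surround(size, sf, theta, theta, r_opt + 6, r_opt + 6)       # radius + 6 px
iso = center_surround(size, sf, theta, theta, r_opt, r_opt + 6)                # iso surround
cross = center_surround(size, sf, theta, theta + np.pi / 2, r_opt, r_opt + 6)  # orthogonal surround
```

Surround suppression then corresponds to a model output for `enlarged` (equivalently `iso`) that is lower than for `optimal`, and cross-orientation facilitation to an output for `cross` that exceeds the output for `optimal`.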

These results suggest that the second-output-layer neurons 
acquire phase insensitivity and complex cell-like receptive field 
properties. However, the simulation of Model 1 is computa- 
tionally expensive, and the learning is very slow to converge. 
Therefore, we examine the receptive field properties of the neu- 
rons in a simplified model. 

3.2. MODEL 2: INFORMATION MAXIMIZATION IN A TWO-LAYER 
NETWORK WITH SIGN-INVARIANT INPUT 

The connection weights $W^+_{ij}$ and $W^-_{ij}$ take on close values in the three-layer network of Model 1 after learning. To examine why this relation holds, we constructed another model network. In Model 2, we maximize the entropy of the outputs $\mathbf{u}$ and $\mathbf{z}$ (Figure 1B1). Here $u_i$ is identical to the nonlinear transformation of independent component $i$ and corresponds to a first-output-layer neuron in Model 1. The output $\mathbf{z}$ corresponds to the second-output-layer neurons in Model 1 and receives connections from $\mathbf{y}^+$ and $\mathbf{y}^-$ through the connection matrices $W^+$ and $W^-$, respectively. We assume that the probability density functions of the outputs of the first-output-layer neurons are even functions and that the signs of these outputs are not correlated. This assumption is supported by the observation that fewer than 1% of the 2 × 2 contingency tables of the signs of two first-output-layer neurons contain entries greater than 0.26 or less than 0.24 (if the signs were independent and each sign equally likely, every entry would be 0.25). Under this assumption, a simple calculation shows that the entropy of the output is maximized if $W^+_{ij} = W^-_{ij}$ (see Methods). Setting $W^+_{ij} = W^-_{ij} = W_{ij}$, Equation 54 becomes

$$z_i = f\Big(-h_i + \sum_{1 \le j \le N} W_{ij}\, |u_j|\Big) \tag{55}$$

(Figure 1B2). For this maximization, we use an ICA algorithm with $N$ input units, $|u_j|$, and $N$ output units, $z_i$. Model 2 reduces the computational load and accelerates the convergence of the learning.
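As an illustration of the kind of ICA step used to learn $W$ in Equation 55, the sketch below implements a batch natural-gradient update in the style of Amari et al. (1996), $\Delta W \propto (I - E[\tanh(\mathbf{s})\mathbf{s}^\top])W$. The tanh score function, learning rate, iteration count, and toy data (a linear mixture of Laplacian sources rather than $|u_j|$ from images) are all assumptions; the paper's own update rule is given in its Methods and is not reproduced here.

```python
import numpy as np

def natural_gradient_ica(X, lr=0.1, n_iter=500):
    """Batch natural-gradient ICA with a tanh score function.

    X: array of shape (n_inputs, n_samples). Rows are mean-centred first; in
    Eq. 55 the threshold h_i can absorb the non-zero mean of the |u_j| inputs.
    Returns the unmixing matrix W.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    n, T = Xc.shape
    W = np.eye(n)
    for _ in range(n_iter):
        S = W @ Xc                                    # current outputs (before f)
        # Delta W = lr * (I - E[tanh(s) s^T]) W; tanh suits sparse (super-Gaussian) sources
        W += lr * (np.eye(n) - np.tanh(S) @ S.T / T) @ W
    return W

# Toy check on a linear mixture of sparse sources (not image data).
rng = np.random.default_rng(1)
S_true = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])               # hypothetical mixing matrix
X = A @ S_true
W = natural_gradient_ica(X)
Y = W @ (X - X.mean(axis=1, keepdims=True))
corr = np.corrcoef(np.vstack([Y, S_true]))[:2, 2:]    # recovered vs. true sources
print(np.round(np.abs(corr), 2))
```

Each recovered output should correlate strongly with exactly one true source; the natural gradient avoids inverting $W$ at every step, which is one reason such rules are cheap enough for $N = 400$.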

Figure 5 shows the connection weights of 100 output neurons 
after learning. Because we assume that $W^+_{ij} = W^-_{ij}$ in this model, the connection weights to a second-output-layer neuron can be represented by a single box (Figure 2B). The inputs to the second-output-layer neurons have sparse distributions, with kurtosis greater than 3, with the exception of only one model neuron. In contrast, none of the outputs of the first-output-layer neurons, $|u_i|$, has kurtosis greater than 3. The receptive fields for more than one third
of the output neurons are similar to that of the output neuron 
shown in Figure 2A. Some of them have excitatory and inhibitory 
connections from input neurons with similar orientation prefer- 
ences, e.g., the third neuron from the right in the second row. 
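The sparseness criterion used above (kurtosis greater than 3, the value for a Gaussian) can be sketched with a non-excess kurtosis estimator. The reference distributions below are toy assumptions chosen for comparison: a Laplacian as an example of a sparse distribution, and a uniform distribution, which the Discussion notes is approximately how $|u_i|$ is distributed.

```python
import numpy as np

def kurtosis(x):
    """Non-excess kurtosis E[(x - mean)^4] / Var(x)^2; a Gaussian gives 3."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

rng = np.random.default_rng(0)
n = 200_000
k_gauss = kurtosis(rng.standard_normal(n))   # reference value, near 3
k_sparse = kurtosis(rng.laplace(size=n))     # sparse distribution, near 6 (> 3)
k_unif = kurtosis(rng.uniform(size=n))       # uniform distribution, near 1.8 (< 3)
print(k_gauss, k_sparse, k_unif)
```

A uniform input has kurtosis 1.8, which is consistent with none of the $|u_i|$ exceeding 3 while the learned outputs become sparse.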
The connection weights from the input neurons with the same
orientation selectivity in distant areas of the image patches tend 
to have inverted signs. Figure 6A shows that this type of neuron is 
suppressed if a larger grating is presented, in a way similar to that 
shown in Figure 4C. Figure 6B shows that enlarging the radius of 
the optimal gratings decreases the output of most neurons. In Model 2, too, 393 out of 400 neurons exhibit surround suppression. The neuron in Figure 6C is also suppressed by enlarging the
radius of the gratings. However, the orientation preference and 
spatial alignment of input neurons sending strong inputs to the 
neuron in Figure 6C are different from those in Figure 6A. The 
neuron in Figure 6A is suppressed by parallel bars; in contrast, the 
neuron in Figure 6C is suppressed by a long bar. Thus, the neu- 
ron in Figure 6C is an end-stopped neuron (Hubel and Wiesel, 
1968), whereas the neuron in Figure 6A is not. Some neurons 
have much more complex receptive fields and different response 
properties. Figure 6D shows that the activity of a second-output- 
layer neuron is suppressed if a grating perpendicular to the 
preferred orientation of this neuron is superimposed on the cen- 
ter of the receptive field of the neuron. This type of suppression 
was reported in complex cells (Bonds, 1989). This neuron has 
excitatory and inhibitory connections from input neurons with 
orientation selectivity perpendicular to the preferred orientation. 
In this type of neuron, the connections from the input neu- 
rons with orientation selectivity perpendicular to each other in 
the same area of the image patches tend to have inverted signs. 
This type of suppression is found in 264 out of 400 second- 
output-layer neurons (p < 0.01, binomial test). Figure 6E shows 
an example of the response classified as cross-orientation facilita- 
tion (Jones et al., 2002). This neuron is suppressed if the radius 
of the optimal grating is enlarged in a manner similar to that 
shown in Figures 4C and 6A. When the orientation of the grating of the outer annulus deviates from that of the inner patch, the response exceeds the response to the optimal grating alone (shown
by the dashed line). In this type of neuron, the connections from 
the input neurons with orientation selectivity perpendicular to 
each other in distant areas of the image patches tend to have 
the same sign. This type of receptive field configuration facil- 
itates the response of the neuron when the orientation of the 
outer annulus is perpendicular to the preferred orientation of the 
neuron. This type of facilitation is found in 369 out of 400 second-output-layer neurons (p < 0.01, binomial test). However, the responses to the stimulus with the surrounding grating perpendicular to the center grating are greater than the response to the center stimulus alone in only 149 second-output-layer neurons. Model 2 also fails to
predict the experimental result of cross-orientation facilitation 
(Jones et al., 2002). In some other neurons, the orientation prefer- 
ence is unclear (Figure 5). However, these neurons receive strong 
excitatory and inhibitory connections from input neurons whose 
preferred positions are restricted to a small area. Thus, these 
neurons respond selectively to edges in these areas. 
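The counts above are reported with binomial tests. A minimal sketch of a one-sided exact binomial test follows, assuming (this is not stated in the section) that the null hypothesis is that each of the 400 neurons is equally likely to show the effect in either direction (p = 0.5). SciPy users can obtain the same tail probability from `scipy.stats.binomtest` with `alternative='greater'`.

```python
from math import comb

def binomial_sf(k, n, p=0.5):
    """One-sided tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Counts reported for Model 1 (376) and Model 2 (369, 264) out of 400 neurons.
for k in (376, 369, 264):
    print(k, binomial_sf(k, 400) < 0.01)     # True for all three
```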

Ringach et al. (2002) reported that the circular variance, defined by

$$1 - \frac{\left|\sum_{\theta} z_i(\theta)\, \exp(2\mathrm{i}\theta)\right|}{\sum_{\theta} z_i(\theta)}, \tag{56}$$

where $\theta$ is the orientation of the grating, $z_i(\theta)$ is the output of neuron $i$ in response to the grating, and $\mathrm{i}$ in the exponent is the imaginary unit, tends to be greater than 0.5 for complex cells. Figure 6F shows that the circular variance of the output neurons of Model 2 also tends to be greater than 0.5, which is consistent with experimental results for complex cells.
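Equation 56 can be computed directly from a sampled orientation tuning curve. This sketch assumes non-negative responses at evenly spaced orientations in [0, π); `circular_variance` is a hypothetical helper name.

```python
import numpy as np

def circular_variance(theta, z):
    """Circular variance (Eq. 56) of outputs z measured at orientations theta (radians)."""
    z = np.asarray(z, dtype=float)
    return 1.0 - np.abs(np.sum(z * np.exp(2j * np.asarray(theta)))) / np.sum(z)

theta = np.pi * np.arange(12) / 12      # 12 orientations in [0, pi)
flat = np.ones(12)                      # untuned output: circular variance near 1
sharp = np.zeros(12)
sharp[3] = 1.0                          # responds to one orientation only: near 0
cv_flat = circular_variance(theta, flat)
cv_sharp = circular_variance(theta, sharp)
print(round(cv_flat, 6), round(cv_sharp, 6))
```

The factor 2 in the exponent maps orientations, which repeat every π, onto the full circle, so θ and θ + π count as the same orientation.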

4. DISCUSSION 

Our models differ from previous models in several ways. First, to 
generate sign-insensitive complex cells, Model 1 does not require 
inputs to be insensitive to the signs of pixels. The model in 
Hyvarinen and Hoyer (2001) assumes that complex cells receive 
the square of the output of simple cells. Complex cells in the 
model by Berkes and Wiskott (2005) are insensitive to the sign of image patches because their output is given by polynomials of degree two in the pixel intensities. In the model by Shan et al.
(2007), the output of simple cells is transformed to the absolute 
value and subjected to a nonlinear transformation. The models by 
Karklin and Lewicki (2005, 2009) also ignore the sign of the out- 
put of simple cells because the output of complex cells in these 
models depends only on the variance of the simple cells. In con- 
trast, Model 1 can generate sign-insensitive complex cells without 
assuming sign-insensitive inputs to complex cells. The informa- 
tion maximization principle gives a possible explanation as to why 
sign-insensitive complex cells arise from sign-sensitive simple cell 
inputs. The results of Model 1 justify the assumption of Model 2 that the output neurons receive the absolute values of the output of simple cell-like neurons as inputs.

FIGURE 6 | Properties of output neurons in Model 2. (A) Output of a neuron as a function of the radius of the gratings. (B) Distribution of the output of 400 neurons in response to the optimal gratings with the preferred radius and to the same gratings with the radius enlarged by 6 pixels. (C) End stopping exhibited by a neuron whose output changes with the radius of the gratings. (D) Modulation of the output of a neuron by the relative orientation of the gratings superimposed on the optimal grating. The dashed line indicates the output of the neuron in response to the optimal grating only. (E) Modulation of the output of a neuron by the orientation of the gratings of the outer annulus. The dashed line indicates the output of the neuron in response to the stimulus with the grating of the inner patch only. (F) Histogram of the circular variance of the orientation tuning of output neurons.

Second, our models can predict the receptive fields exhibiting surround suppression and
facilitation. The complex cell-like receptive fields produced by the 
models of Foldiak (1991) and Hyvarinen and Hoyer (2001) are 
devoid of these properties. The models of Foldiak (1991) and 
Berkes and Wiskott (2005) also differ from our models in that 
their models require time-varying sequences of image patches 
as inputs. Although these models yield results similar to those of our models, none of them is based on the information maximization principle; instead, they are based on other optimization criteria, such as fitting the parameters of probability distributions and minimizing the temporal change of the response.

In Model 2, we transform the $i$-th independent component, $a_i$, to $|u_i| = |f(a_i)|$. Shan et al. (2007) used a nonlinear function that transforms the absolute values of the independent components into a standard normal distribution. In Model 2, $|u_i|$ is almost uniformly distributed rather than normally distributed, because the entropy of $f(a_i)$ is maximized when $f(a_i)$ is uniformly distributed over the range of the bounded function $f(x)$. Our results show that uniformly distributed inputs can form complex cell-like receptive fields. The model proposed by Karklin and Lewicki (2005) also resembles Model 2. Each output unit of their model detects a specific set of covariances among input variables. For example, one output becomes large when $|x_1|$ and $|x_2|$ are large, and another output becomes large when $|x_1|$ is large and $|x_2|$ is small. The linear superposition of $|u_i|$ plays a similar role in Model 2: a large $|u_i|$ indicates that the absolute value of the $i$-th simple cell-like input is large, and a small $|u_i|$ indicates that it is small.
Their model requires the sparseness of the output distribution, 
which is also required in Model 2, because the linear ICA algo- 
rithms can be derived by assuming the sparseness of the source 
distribution. 
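The uniformity argument rests on a standard infomax fact: a bounded transfer function maximizes the entropy of its output when it matches the cumulative distribution function (CDF) of its input. A minimal sketch, assuming a unit Laplacian independent component $a$ and a hypothetical $f(a) = 2F(a) - 1$, where $F$ is the Laplacian CDF (this $f$ is chosen for the toy source and is not the paper's activation function), shows that $|f(a)|$ is then uniform on $[0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.laplace(size=100_000)           # toy sparse independent component

def f(a):
    # Hypothetical f matched to a unit Laplacian: 2 * CDF(a) - 1, range (-1, 1).
    return np.sign(a) * (1.0 - np.exp(-np.abs(a)))

u_abs = np.abs(f(a))                    # the Model 2 input |u_i| = |f(a_i)|
hist, _ = np.histogram(u_abs, bins=10, range=(0.0, 1.0))
frac = hist / hist.sum()
print(np.round(frac, 3))                # each bin holds roughly 10% of samples
```

Because $f(a)$ is uniform on $(-1, 1)$ and symmetric, folding it by the absolute value keeps it uniform on $[0, 1)$.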

The nonlinearity of V1 complex cells has been studied using models that pool simple cell-like units (Sakai and Tanaka, 2000; Martinez and Alonso, 2001). We attempted to examine the second-order nonlinearity of the second-output-layer neurons. However, reverse correlation of these neurons did not reveal second-order Wiener-like kernels similar to those observed for complex cells (Szulborski and Palmer, 1990). This is presumably because the on- and off-regions of most first-output-layer neurons are as narrow as 1 pixel.
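For reference, a second-order kernel of the kind mentioned above can be estimated by reverse correlation with Gaussian white noise. The sketch applies the estimator to a hypothetical two-filter energy model whose true kernel is known; it illustrates the estimator only and is not the analysis the authors applied to their second-output-layer neurons, whose details are not given in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 100_000
# Hypothetical quadrature-pair "energy" unit: two orthonormal filters g1, g2.
g = np.linalg.qr(rng.standard_normal((d, 2)))[0]    # shape (d, 2), orthonormal columns
A_true = g @ g.T                                     # z = s^T A_true s

s = rng.standard_normal((n, d))                      # Gaussian white-noise stimuli
z = np.sum((s @ g) ** 2, axis=1)                     # z = (g1.s)^2 + (g2.s)^2

# Reverse-correlation estimate of the second-order kernel:
# for white Gaussian s, E[(z - E[z]) s s^T] = 2 A, so K = E[(z - E[z]) s s^T] / 2.
zc = z - z.mean()
K = (s * zc[:, None]).T @ s / (2 * n)
rel_err = np.linalg.norm(K - A_true) / np.linalg.norm(A_true)
print(round(rel_err, 3))
```

The identity used in the comment follows from Isserlis' theorem, $E[(\mathbf{s}^\top A \mathbf{s})\,\mathbf{s}\mathbf{s}^\top] = \mathrm{tr}(A)\,I + 2A$ for symmetric $A$; a kernel whose on- and off-regions are only 1 pixel wide is correspondingly hard to resolve in such an estimate.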

From our study, we speculate that complex features can be detected by combining the inputs from a large number of simpler elements. By increasing the number of output layers in our models, we would obtain higher-order neurons that are selective to more complex features. The information maximization of a multilayer network with higher-order neurons would be capable of explaining the complicated selectivity of neurons in higher visual areas.

However, our models have some limitations, most notably that they are feedforward networks. Although this structure simplifies the models and facilitates the derivation of the learning rules, cortical networks have feedback and recurrent structures as well as feedforward structures. Feedback from higher to lower sensory areas is known to play an essential role in sensory cortices. Bardy et al. (2006) reported that inactivating the feedback from higher visual areas affected the selectivity of neurons in V1: the inactivation changed the responses of a substantial proportion of neurons classified as complex cells to simple cell-like responses, indicating that feedback from higher visual areas modifies the receptive fields of complex cells. The responses of complex cells thus appear to be shaped by both feedforward and feedback or recurrent mechanisms. Simple cells, which are assumed to provide inputs to complex cells in this paper, receive recurrent connections through which they exhibit surround suppression (Burr et al., 1981; Walker et al., 2000). Ringach et al. (2002) showed that simple cells with odd-symmetric receptive fields are more numerous in V1 than ICA models predict. This may be why reverse correlation of the second-output-layer neurons did not reveal Wiener-like kernels similar to those observed for complex cells. Although the receptive fields of some second-output-layer neurons in our models are irregular and differ from those of model neurons with typical complex cell-like receptive fields, introducing higher-order neurons and recurrent connections (Tanaka et al., 2009) could cause the first-output-layer neurons to exhibit properties more similar to those of simple cells, which in turn may increase the number of model neurons with complex cell-like receptive fields.